
Thread: need some help saving captcha[VB.NET]

  1. #1
    Noobie
    Join Date
    Apr 2008
    Posts
    81
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Default need some help saving captcha[VB.NET]

    Well, I got the Decaptcher API working, but I can't save the captcha image. I tried loading it into PictureBox1 with either of these snippets:
    Code:
    PictureBox1.Load(WebBrowser1.Document.GetElementById("captcha").Parent.Parent.GetElementsByTagName("img")(0).GetAttribute("src"))
    or
    Code:
    For Each ImgElement As HtmlElement In WebBrowser1.Document.Images
        Dim b = ImgElement.GetAttribute("SRC")
        If b.Contains("/captcha/") Then
            PictureBox1.Load(b)
        End If
    Next
    Both snippets get the src of the image, but the captcha I see in PictureBox1 is different from the one shown in WebBrowser1.
    (If I open the SRC in FF or IE, I see the same captcha as on the page.)
    Anyway, I think PictureBox1.Load messes it up. Is there a way to download the image straight from WebBrowser1, and if so, how?

    I just need to have the captcha in PictureBox1 (then I save it),
    or just save it directly (c:\blabla\captcha.jpg).

  3. #2
    Noobie
    Join Date
    Apr 2008
    Posts
    23
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Default

    You need to pass the cookies

  4. #3
    Noobie
    Join Date
    Apr 2008
    Posts
    81
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Default

    how?

  5. #4
    Noobie
    Join Date
    Apr 2008
    Posts
    12
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Default

    My guess is that passing the cookies will not help. Each request to the web server returns its own image: a new image for every request.

    Here is my suggestion:
    - my guess is your crawler is based on MSHTML (WebBrowser obj in .NET is using MSHTML internally)
    - so go to your IE Settings and DISABLE IMAGE DOWNLOADING
    - leave the rest of your code the way it is.

    So now the very first request for the SRC will come from PictureBox1.Load(b), and you will get the same image as you would get in your browser.


    They MAY send cookies together with the image and expect you to send those cookies back. If my suggestion above does not work, you will have to use a proxy server, log all the communication, and see what kind of cookies/requests are travelling back and forth.

    HTH

  6. #5
    Noobie
    Join Date
    Apr 2008
    Posts
    81
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Default

    I tried disabling images; it didn't work.
    The proxy thing sounds hard, and I'm just a beginner with VB.NET.
    Is there an easier way, like screen capture? I also heard about this expression (WebBrowser1.Document.Images(0)) but I'm not sure how to use it. I checked the Temp folder for the captcha image, but found nothing ;/
    lol, why is it so hard to get this captcha image and so easy to get Yahoo's ;[

  7. #6
    Noobie
    Join Date
    Apr 2008
    Posts
    12
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Default

    Download Fiddler from Microsoft (a proxy server for web developers) and learn how to use it, since you have your "real task at hand".

    So run fiddler and then visit your page with captcha as the regular user would do. Type in and submit the captcha.

    Then switch to fiddler and carefully examine HTTP requests / responses

    You will need to see how IE requests the captcha image when a real human user visits the page. Things to look for:
    - what the exact HTTP request for the captcha is (referrer header? any cookies?)
    - what the server response is (besides the image binary data, are there any new cookies being sent in the response headers, etc.)

    Well, if a proxy is difficult for you, screen capturing is even more difficult.

  8. #7
    Noobie
    Join Date
    Apr 2008
    Posts
    81
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Default

    Thanks, going to do it now, will report back.
    Edit: yeah, got a cookie.
    Here is the raw request for the captcha URL:
    HTML Code:
    GET http://www.website.com/validator/456/1268451159.gif HTTP/1.1
    Accept: */*
    Referer: http://www.website.com/file/files/456/3338/
    Accept-Language: en-US
    User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET CLR 1.1.4322; .NET CLR 4.0.20506)
    Accept-Encoding: gzip, deflate
    Host: www.website.com
    Connection: Keep-Alive
    Cookie: uid=5564236; uhsh=%25A8%25CDfG%2501%2529%25A3%2527%25E4%25D6%2560%25E1%2592%250D%2529%25A7%258D%25A8_75ff9de3c93f220264a5d49ad7842a11b; lkni=1430774893; UserInteraction4=KonaBase; vc23835752=7653443762118; __utma=140781479.284486472.1268445511.1268445511.1268451162.2; __utmb=140781479.3.10.1268451162; __utmz=140781479.1268445511.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); __utmc=140781479
    But honestly, I have no idea what to do with all of this information. I only know how to use WebBrowser1, not HttpWebRequest (I've been using VB.NET for about 7-8 days now).
    P.S. Can I use something like "WebBrowser.Document.Cookie"?

  9. #8
    Noobie
    Join Date
    Apr 2008
    Posts
    1
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Default

    You shouldn't need cookies to snag the captcha image. There are a few ways to do it, and I'm not exposing the way Auto-CraigsLister does it because I don't want reCAPTCHA to change something and screw it up. You can get a screenshot of the page; just google "VB web page thumbnail" or "VB web page screen shot". You can take it further by getting the offsetParent position, width and height of the captcha image and taking a screenshot of just that area of the web page. That way is annoying and slow, but it will do what you need.
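
    The "screenshot of just that area" idea could look roughly like the sketch below. It is only an illustration, not how any particular tool does it. The module and function names (CaptchaScreenGrab, DocumentBounds, GrabElement) are made up here, and the sketch assumes a WinForms WebBrowser, a page that is not scrolled, and a captcha that is fully visible on screen and not covered by another window. Any border around the document may shift the result by a few pixels.

```vbnet
Imports System.Drawing
Imports System.Windows.Forms

Module CaptchaScreenGrab
    ' Walks up the OffsetParent chain and sums the offsets, giving the
    ' element's rectangle relative to the top-left corner of the document.
    Function DocumentBounds(ByVal element As HtmlElement) As Rectangle
        Dim bounds As Rectangle = element.OffsetRectangle
        Dim parent As HtmlElement = element.OffsetParent
        While parent IsNot Nothing
            bounds.Offset(parent.OffsetRectangle.Left, parent.OffsetRectangle.Top)
            parent = parent.OffsetParent
        End While
        Return bounds
    End Function

    ' Copies the screen pixels that currently show the element.
    ' Assumption: the document is scrolled to the top, so document
    ' coordinates equal client coordinates of the browser control.
    Function GrabElement(ByVal browser As WebBrowser, ByVal element As HtmlElement) As Bitmap
        Dim bounds As Rectangle = DocumentBounds(element)
        If bounds.Width <= 0 OrElse bounds.Height <= 0 Then
            Throw New InvalidOperationException("Element has no size yet; is the image loaded?")
        End If
        Dim topLeft As Point = browser.PointToScreen(bounds.Location)
        Dim shot As New Bitmap(bounds.Width, bounds.Height)
        Using g As Graphics = Graphics.FromImage(shot)
            g.CopyFromScreen(topLeft, Point.Empty, bounds.Size)
        End Using
        Return shot
    End Function
End Module
```

    With the ImgElement variable from the first post, usage would be along the lines of PictureBox1.Image = GrabElement(WebBrowser1, ImgElement). To write a JPEG file, pass an explicit format to Bitmap.Save (for example Imaging.ImageFormat.Jpeg), because Save with only a path does not pick the format from the file extension.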

    I don't know VB syntax so don't ask me for sample code, I'm a C# guy.

    PS. Don't PM me asking how Auto-CraigsLister does it, because I will never tell!!!! Seriously, don't...


    After looking at the Fiddler capture, it isn't reCAPTCHA, so just look for someone's "VB image download" code and use the URL http://www.website.com/validator/456/1268451159.gif to download the captcha image.

  10. #9
    Noobie
    Join Date
    Apr 2008
    Posts
    12
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Default

    Even though it is possible, you do NOT need to set "WebBrowser.Document.Cookie". (Cookies can also be set through MSHTML with whatever JavaScript methods a page would use. You could inject JavaScript into pages, and much more, but that is beside the point.)

    I see 2 possible solutions:

    1. use WebRequest
    ==============

    Disable images in IE, and then craft a WebRequest like the one you posted from the proxy log, with referrer, cookies, etc. Everything is easy; the only thing you have to add is the request headers containing the cookies. And you need to READ the real cookie values from the WebBrowser, something like this:

    Dim cookieString As String = CType(webBrowserObj.Document, mshtml.IHTMLDocument2).cookie

    If you create the right WebRequest, they will have no way of telling whether it comes from IE or from WebRequest. So they will send you a binary response stream, you save it to a file, and there you go. The file type (JPG, GIF, or whatever) you could hardcode for this site, or you could read the Content-Type header on the response, which holds a string like "image/jpeg", and convert it into the proper file extension. Those are called MIME types, and they are standardized.
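
    A minimal sketch of solution 1, assuming a WinForms WebBrowser control. The names CaptchaDownload, ExtensionFor and SaveCaptcha are invented for this example. It reads the cookies through the managed HtmlDocument.Cookie property rather than the mshtml cast; that property does not return HttpOnly cookies, so the sketch assumes the site does not depend on those. The user agent string is also an assumption and should be replaced with whatever your own proxy log shows.

```vbnet
Imports System.IO
Imports System.Net
Imports System.Windows.Forms

Module CaptchaDownload
    ' Maps a Content-Type value such as "image/gif; charset=..." to a file
    ' extension. Unknown or missing types fall back to ".bin".
    Function ExtensionFor(ByVal contentType As String) As String
        If String.IsNullOrEmpty(contentType) Then Return ".bin"
        Dim mime As String = contentType.Split(";"c)(0).Trim().ToLowerInvariant()
        Select Case mime
            Case "image/jpeg", "image/jpg"
                Return ".jpg"
            Case "image/gif"
                Return ".gif"
            Case "image/png"
                Return ".png"
            Case Else
                Return ".bin"
        End Select
    End Function

    ' Requests imageUrl with the browser's cookies and the current page as
    ' referrer, writes the response body to disk, and returns the saved path.
    Function SaveCaptcha(ByVal browser As WebBrowser, ByVal imageUrl As String, _
                         ByVal pathWithoutExtension As String) As String
        Dim request As HttpWebRequest = CType(WebRequest.Create(imageUrl), HttpWebRequest)
        request.Accept = "*/*"
        request.Referer = browser.Url.ToString()
        ' Assumed value; copy the User-Agent from your own proxy log instead.
        request.UserAgent = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1)"

        Dim cookies As String = browser.Document.Cookie
        If Not String.IsNullOrEmpty(cookies) Then
            request.Headers.Add(HttpRequestHeader.Cookie, cookies)
        End If

        Using response As HttpWebResponse = CType(request.GetResponse(), HttpWebResponse)
            Dim target As String = pathWithoutExtension & ExtensionFor(response.ContentType)
            Using source As Stream = response.GetResponseStream()
                Using output As FileStream = File.Create(target)
                    Dim buffer(8191) As Byte
                    Dim count As Integer = source.Read(buffer, 0, buffer.Length)
                    While count > 0
                        output.Write(buffer, 0, count)
                        count = source.Read(buffer, 0, buffer.Length)
                    End While
                End Using
            End Using
            Return target
        End Using
    End Function
End Module
```

    Usage would be something like Dim saved As String = SaveCaptcha(WebBrowser1, b, "c:\blabla\captcha"), followed by PictureBox1.Load(saved), where b is the SRC string from the first post. As described above, this only gives the image the page would have shown if image downloading is disabled in IE, so that this request is the first one for that URL.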

    2. IHTMLElementRender
    ==================

    You would need to ENABLE images in IE. Then you could try to use the IHTMLElementRender interface on the captcha image DOM node (the one you are taking the SRC from). It should render the image into a System.Drawing.Bitmap object, which can then be saved to a file. I have not done it personally, so you could do a search on IHTMLElementRender and try to make it work (it could be tough if you are new to this whole thing). Otherwise I suggest you post a little project on a freelancer site and someone will code the IHTMLElementRender solution for you. It is the more universal approach: you don't have to care about requests or cookies, and you will be able to render an image from ANY DOM node (possibly including Flash).

    Good luck with your proj

  11. #10
    Noobie
    Join Date
    Apr 2008
    Posts
    12
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Default

    hxxp://www.developerfusion.com/code/4712/generate-an-image-of-a-web-page/

    Just read that page; it has a complete code sample in VB and C#. The sample makes a picture of the whole page, though, and you only need your image node.

    So you could change this line of code:
    Dim element As IHTMLElement = CType(document.body, IHTMLElement)

    to this:
    Dim element As IHTMLElement = CType(ImgElement.DomElement, IHTMLElement)

    (from your OP I assume ImgElement is the name of your variable that holds the IMG DOM node with the captcha; since it is a WinForms HtmlElement, the cast goes through its DomElement property)

    You should be all set then.
    In fact, I could use that code for my own purposes as well
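
    For reference, the IHTMLElementRender idea can be sketched as below. This is not the code from the linked article; it is a hand-written illustration. The interface declaration is written out manually instead of using the mshtml interop assembly, and the module and function names (CaptchaRender, RenderElement) are made up. It assumes images are enabled and the captcha has finished loading. DrawToDC is reported to behave differently across IE versions, so treat it as something to try rather than a guaranteed fix.

```vbnet
Imports System
Imports System.Drawing
Imports System.Runtime.InteropServices
Imports System.Windows.Forms

' Hand-written declaration of the MSHTML COM interface IHTMLElementRender.
<ComImport(), _
 Guid("3050f669-98b5-11cf-bb82-00aa00bdce0b"), _
 InterfaceType(ComInterfaceType.InterfaceIsIUnknown)> _
Public Interface IHTMLElementRender
    Sub DrawToDC(ByVal hdc As IntPtr)
    Sub SetDocumentPrinter(<MarshalAs(UnmanagedType.BStr)> ByVal printerName As String, _
                           ByVal hdc As IntPtr)
End Interface

Module CaptchaRender
    ' Asks MSHTML to paint a single DOM element into a new bitmap.
    Function RenderElement(ByVal element As HtmlElement) As Bitmap
        Dim size As Size = element.OffsetRectangle.Size
        If size.Width <= 0 OrElse size.Height <= 0 Then
            Throw New InvalidOperationException("Element has no size yet; is the image loaded?")
        End If

        ' DomElement exposes the underlying COM object behind the managed wrapper.
        Dim renderer As IHTMLElementRender = CType(element.DomElement, IHTMLElementRender)
        Dim result As New Bitmap(size.Width, size.Height)
        Using g As Graphics = Graphics.FromImage(result)
            Dim hdc As IntPtr = g.GetHdc()
            Try
                renderer.DrawToDC(hdc)
            Finally
                g.ReleaseHdc(hdc)
            End Try
        End Using
        Return result
    End Function
End Module
```

    With the loop from the first post, PictureBox1.Image = RenderElement(ImgElement) would replace the PictureBox1.Load(b) call, and the bitmap can then be saved to c:\blabla\captcha.jpg with Bitmap.Save and an explicit image format.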
